Abstract



In this study, heterogeneous bibliographic information resources from geographically distributed
locations are integrated in an automated, unified, and controlled way by using abstract data types through
the Message-Object Model as defined in Smalltalk-80. A unit of modularity called a "class" is developed that
defines operations to process the data structures encapsulated in the class. The classes focus on processing
bibliographic citations obtained from heterogeneous on-line bibliographic databases into a meta-form with
the goal of developing information consistency to simplify further information analysis. Classes developed
for the bibliographic citation application can speed program development because the data abstractions can
be used in processing generic information such as dates regardless of the bibliographic database source.
Prototype classes are developed to show the ease of encapsulating data structures and behaviors for the
bibliographic citation application. Data abstraction provides a powerful integration technique that allows
the designer to work with bibliographic citation objects without being encumbered with the details of
implementation.



Keywords: 

abstract data types, message-object model, class message, class methods, Smalltalk-80, Objective-C,
information consistency, database consistency, database reformatting, database integration. 
Chapter 1:



Introduction 



In recent years, the development of abstraction mechanisms in languages has focused on abstract
data types to "manage complexity by emphasizing what is significant to the user and suppressing what is
not"[Sha84]. This has led to modern programming languages such as Smalltalk-80, Flavors, and Ada.
Software methodologies have been developed to address engineering concerns in requirements,
specification, design, implementation, correctness, and reliability to reduce cost during the software
development and maintenance phases. The use of abstractions to logically reduce the complexity of the
task is aided by modern language mechanisms in that they provide the language constructs to encapsulate a
logical data type and the operations associated with it. The language constructs of "Classes" or "Flavors"
help in the abstraction process. This project is based on the use of abstractions to obtain data consistency in
heterogeneous databases. Our specific implementation was applied to bibliographic information. Similar
techniques may be applied to other types of databases, as described in Chapter 7.

Bibliographic citation databases from heterogeneous information resources are used widely in
research and development work. These databases are often accessed to do a subject search or to prepare a
bibliography. The citations contained in the bibliographic databases may be large in number and collected
over a long period of time. This process was done manually before computers became readily available,
and was tedious and error prone. Today computers are used widely for this task. Modern
automated tools have been developed to assist in such bibliographic processing and are continually being
enhanced[Gol85].

A research task may consist of accessing several bibliographic systems such as DIALOG, INSPEC,
NASA/RECON, DOE/RECON, or DOD/DROLS.
The respective retrieved citation lists are down-loaded into a user file for post-processing analysis. Each
database has its own form because of independent development programs and a lack of generally accepted
standards. Hence post-processing analysis on a database citation file requires an individualized software
processing package for each citation database. Sometimes the user's files are merged if software is
available to translate the files into a common format.

Analyzing data from the down-loaded and merged files requires data consistency. Hence a
prototype has been developed to provide a tool that makes heterogeneous bibliographic citation databases
consistent. For example, the search for citations within a range of dates is encumbered by the problem that
dates may be represented in different formats in different databases. Searches on author names are also a
problem if different databases enter first, middle, or last names in varying formats. The goal is to have one
tool process the heterogeneous bibliographic citations into a standard form to provide the basis for
convenient data analysis.

Significant improvements are made by conceptualizing the problem of data consistency through
abstractions in terms of Smalltalk classes. Since the information types in citations are broadly similar,
classes can be developed for each type of information such as "date" or "title". Careful specification of the
classes can simplify the programmer's task since interfaces will be defined, and data and their behaviors
will be understood. Another improvement occurs when future enhancements are built on the classes
already developed, which reduces the amount of new software needed.

This study shows the ease of developing prototype classes for integrating heterogeneous
bibliographic citation databases and suggests the basis for the development of additional classes required
for the complete application. The modularity of software, the inheritance by classes, the encapsulation of
data structures and operations, and the use of dynamic binding reduce the task of the software designer and
developer. Hence the Object-Message abstraction narrows the gap between the concepts and analysis of the
problem and the notation used in the computer software to solve the problem.

In the following chapters, we discuss the background, motivation, and development of abstract data
types via Smalltalk-80 classes to solve the problem of data consistency in heterogeneous bibliographic
citation databases.
Chapter 2 discusses previous methods used. Chapter 3 gives the characteristics of the Message-Object
Model. Chapter 4 discusses the physical hardware and software methods used to create the Objective-C
classes for the prototype. Chapter 5 discusses the specifics of the prototype implementation. Chapter 6
discusses the results of the prototype implementation, and the last chapter discusses future directions.



Chapter 2:



Previous Methods for Processing Heterogeneous 
Bibliographic Information 



This chapter gives some background information on bibliographic citation databases and discusses
previous methods for processing the information.

2.1 Why Use Heterogeneous Bibliographic Information

Hall and Brown provide a statistical study of the on-line bibliographic databases that is the basis of
this section[Hal83]. Online databases have been available since the 1960s but have mostly been in-house.
Since 1972, there has been a rapid growth of publicly accessible databases.

Table I

Number of Bibliographic References Available Online
(in millions)

Year    References
1968    1/4
1972    3 [remainder illegible]
1976    [illegible]
1980    58
1982    77

The current rate of addition is 8.7 million references per year. With duplication accounted for, the
estimate is 50 million unique references available for use and six million additions to the reference pool
made per year.

Parallel to the four-fold increase from 1976 to 1982, the growth in on-line use is estimated to be
six-fold, as seen in Table II.

Table II

Bibliographic Searches on Public Systems in U.S.A. and Canada
(in millions)

Year    Searches
1975    1
1977    2
1979    4
1981    6



There are four particularly predominant database services. They are listed in Table III. Each supplier
strives to offer databases that the others do not carry. Nearly 20 percent of the important databases are not
available from the four services.



Table III

Unique and Common Databases Available from Major Suppliers

Supplier         BRS   DIALOG   IRS [?]   ORBIT
Unique             8       39         5      24
Common            28       56        27      28
Total Number      36       95        32      52
Total Percent     21       55        18      30



The vast repertoire of information makes access to heterogeneous bibliographic information an
important resource to a researcher. From Table III, we see that a password to DIALOG gives access to
fifty-five percent of the databases. An additional password to ORBIT gives a total access to seventy
percent of the databases.

By 1984, more than 2453 citation and numeric data files were available from 362 on-line
information vendors[Cua84]. Scientific disciplines are continually adding to the published set of abstracts
and citations. Most on-line bibliographic information is still obtained in printed form after an on-line
search. The vast amount of information needs a tool with a unified view to extract significant scientific
and technological intelligence.

2.2 Description of Bibliographic Citations

To understand the problems involved in heterogeneous bibliographic citations a simple MEDLINE 
citation is described as it is mounted on BRS. Only six fields were selectively down-loaded. 

Sample Bibliographic Citation 

[AU] Bowry-T-R. Oywang-J. Lumba-M. 

[IN] Department of Human Pathology, Faculty of Medicine, University of Nairobi, Kenya. 

[TI] HBV infection: prevalence of core antibody and other markers in urban based black school
children in Kenya.

[SO] Ann-Trop-Paediatr. 1983 Dec. 3(4). P 197-200.

[LG] EN.

[IS] 0272-4939



AU represents the authors, with hyphens separating initials. LG represents language and IS is the
accession number for the citation in the particular database. IN represents the institutional affiliation of the
author, TI is the title, and SO is the source. The same bibliographic citation from a different database
source may be formatted in a completely different way. Inconsistency in the detail field hinders
information analysis[Gol85].



2.3 Processing of Heterogeneous Bibliographic Information

With the appropriate administrative requirements fulfilled, a user can down-load bibliographic
records from a variety of on-line services such as BRS or DIALOG. Typically, an off-line printing
follows a search, and is arranged in reverse chronological order. The need for computer-based editing
tools is a natural consequence. Rather than obtaining the down-loaded information in stacks of printout,
the bibliographic citations are down-loaded to a disk file so a computer can be used for automated
processing of the information. We observe two problems in local processing of the file. The file
must be translated into a common form to handle the different database formats for data tags, and the
inconsistencies in the detail information associated with each data tag must be resolved.

Tools to develop data consistency are available in most modern database management systems.
Information consistency within a specific bibliographic database may also be augmented by locally
developed software and procedures. The database administrator can use software tools to constrain data
entry to meet certain requirements. The user may be required to enter data strictly in integer format within
a certain range of values or character format within a certain string length. Furthermore, the user may be
required to enter strings that are pre-defined in a dictionary for that attribute, such as one of eight
acceptable colors. We can see at this point that information may be entered correctly into a particular
database in formats that are individually defined by the local database administrator. However, there may be
inconsistent formats among the heterogeneous bibliographic databases because of a lack of standards and
autonomous database development and administration. For example, dates can be constrained in a local
database to be in either the May 1, 1985 or the 1 May 1985 format.



There may be additional differences in upper/lower case, abbreviations, spaces, or punctuation. These
inconsistencies hinder the automated processing of bibliographic citations in the down-loaded disk file.
Hence we find that in processing a search based on date ranges, software must be written to handle the date
discrepancies, or the search will be incomplete. Author names also introduce problems because R. L.
Smith, Richard L. Smith, and Richard Lee Smith may be names of the same author. If one desires a list of
articles written by Richard Lee Smith after a certain date, the tabulation would be inaccurate.
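
The author-name problem can be made concrete with a minimal Python sketch. It is not the prototype's code (the prototype uses Objective-C classes); the function name `author_key` and the choice of a surname-plus-initials key are assumptions made here for illustration.

```python
import re

def author_key(name):
    """Reduce a 'First Middle Last' name to a (surname, initials) key.

    Hypothetical helper: assumes the surname is the last token.
    """
    # Split on whitespace and periods, dropping empty pieces.
    parts = [p for p in re.split(r"[\s.]+", name.strip()) if p]
    surname = parts[-1].lower()
    initials = "".join(p[0].upper() for p in parts[:-1])
    return (surname, initials)

variants = ["R. L. Smith", "Richard L. Smith", "Richard Lee Smith"]
keys = {author_key(v) for v in variants}
print(keys)  # all three variants collapse to one key
```

A key this coarse also merges distinct authors who share a surname and initials, so a fuller treatment would likely need an authority list of known authors.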

A recent study of popular 'front-end systems' available on the market for processing bibliographic
citations shows that the user has a limited choice of features such as down-loading and file creation (e.g.,
SciMate, inSearch, edNn)[Bol84]. Software is not available to address the problem of data consistency
among heterogeneous databases.

Goldstein and Prettyman have developed software to process down-loaded citations with the goal of
incorporating a specified reference format into manuscripts. In their work they encounter the typical
problems of processing heterogeneous bibliographic citations.
They propose transforming each citation into the following canonical format.



Field #   Data Element        Tag
1         TYPE                TY
2         DATABASE            EE
3         TITLE               TI
4         AUTHOR              AU
5         SOURCE              SO
6         INSTITUTION         IN
7         NO. & TYPE MTG      NO
8         MEETING TITLE       TO
9         VOLUME NO.          VL
10        ISSUE               IS
11        MONTH (JOURNAL)     MD
12        DAY (JOURNAL)       DY
13        YEAR (JOURNAL)      YR
14        MONTH (MEETING)     MM
15        DAY (MEETING)       DM
16        YEAR (MEETING)      YM
17        PAGES               PG
18        TOTAL PAGES         TP
19        PUBLISHER           PU
20        PUBL. CITY          PT
21        PUBL. STATE         PS
22        PUBL. COUNTRY       PC
23        PUBLICATION YR      PY
24        MTG. CITY           MT
25        MTG. STATE          MS
26        MTG. COUNTRY        MC
27        REPORT NO.          RN
28        RETRIEVAL NO.       RG
29        ISSN NO.            SN
30        PART NUMBER         PN
31        CODEN               CD
32        NOTES               NT
33        EDITOR TYPE         ED
34        AVAILABILITY        AV
35        COPYRIGHT YEAR      CY
36        PUBL. AUTHOR        Ai



The process is divided into three stages:

Pre-Processing

Parsing

Post-Processing



Steps for pre-processing records down-loaded from heterogeneous databases into separate local files
are:

1. translate field labels in all files to a common set;

2. include fields for, and add database and retrieval system names to all records;

3. merge all records into one file;

4. reorder the records into a format that is optimized for further processing;

5. determine and add the type of publication;

6. standardize the format of the author's name.
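
Steps 1 to 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Goldstein and Prettyman's software: the label tables in `TAG_MAPS` are hypothetical (only the BRS tags AU, IN, TI, and SO appear in the sample citation of Section 2.2; the "OTHER" labels are invented), and the `DATABASE` key simply stands in for the canonical database field.

```python
# Hypothetical label tables mapping each service's field labels to a common set.
TAG_MAPS = {
    "BRS": {"AU": "AU", "IN": "IN", "TI": "TI", "SO": "SO"},
    "OTHER": {"AUTHOR": "AU", "CORP": "IN", "TITLE": "TI", "SOURCE": "SO"},
}

def preprocess(record, database):
    """Step 1: translate labels; step 2: record which database it came from."""
    tag_map = TAG_MAPS[database]
    common = {tag_map[tag]: value for tag, value in record.items() if tag in tag_map}
    common["DATABASE"] = database
    return common

def merge(files):
    """Step 3: merge the translated records of every file into one list."""
    merged = []
    for database, records in files.items():
        merged.extend(preprocess(r, database) for r in records)
    return merged

files = {
    "BRS": [{"AU": "Bowry-T-R.", "SO": "Ann-Trop-Paediatr. 1983 Dec. 3(4). P 197-200."}],
    "OTHER": [{"AUTHOR": "Smith, R. L.", "TITLE": "An example title"}],
}
merged = merge(files)
print(merged[1])  # {'AU': 'Smith, R. L.', 'TI': 'An example title', 'DATABASE': 'OTHER'}
```

Keeping the per-database knowledge in a table rather than in code means a new service only needs a new table entry, although the detail inside each field is still inconsistent at this point.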

The parsing stage separates the complex source field into discrete information. Further details
are found in Chapter 3.

Post-processing further formats the information for consistency in the end-product application
program. The end-product could be a statistical analysis based on certain keywords or a bibliography for a
publication.

The post-processing tasks are: 

1. conversion for case consistency; 

2. standardize journal titles;

3. correct inconsistencies in format; 

4. expand abbreviated titles; 

5. add missing data; 

6. make linkages between articles and proceedings; chapter and citations. 

The Goldstein and Prettyman work involves knowing the database source and then writing specific
software for that bibliographic database source. Their proposal for a canonical form for bibliographic
citation databases is an attempt to develop standardization regardless of the bibliographic citation sources.

A significant amount of work has been done in the processing of heterogeneous bibliographic
citation databases by the Technology Information System (TIS) of the Lawrence Livermore National
Laboratory (LLNL). They have been working on technology transfer through computer networks located
nationally and abroad since 1975 and have developed the Integrated Information System (IIS) that manages
information and resources on the TIS system. IIS supports the down-loading and analysis of bibliographic
citations from heterogeneous database services. A major goal is to provide the capability to extract
scientific and technological intelligence from the information contained in these databases. To accomplish
this, software has been developed to process bibliographic citations from the federal information centers of
the Department of Energy (DOE), the Department of Defense (DOD), and the National Aeronautics and
Space Administration (NASA) as well as the three major U.S. commercial services: Lockheed-DIALOG,
SDC-ORBIT, and BRS[Bol84].

The Integrated Information System (IIS) software package is menu-driven and provides the
following bibliographic database options:

[TRANSLATE]   translates citations to a standard format
[MERGE]       combines translated files from different sources into one file
[STAT]        creates a statistical profile of citations
[ANALYZE]     analyzes bibliographical text
[REVIEW]      permits on-line evaluation of citations for relevancy
[CONCORD]     creates indexes by author, subject, descriptors, etc.
[PERMUTE]     issues multi-term statistics of the text in selected data fields
[CROSS]       cross-correlates the contents of data fields
[PLOT]        shows the number of citations by year in a graph
[DISPLAY]     displays the contents of any file on the CRT screen
TRANSLATE, MERGE, DISPLAY, and REVIEW do the pre-processing steps 1-4 mentioned by
Goldstein and Prettyman. ANALYZE, CONCORD, PERMUTE, CROSS, PLOT, and STAT allow the
user to produce some trend analysis from the bibliographic citations that have gone through the
pre-processing steps.

Currently, the trend analysis is not entirely accurate since the detail information is not entirely
consistent. A closer examination of the pre-processed files shows dates in the following forms:

1. 1 May 1985 

2. May 1, 1985 

3. 1985. 

4. 5/1/1985 

5. 1985

6. 5/1/85

7. May 1985 

8. May, 1985 
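
As a rough illustration of what reconciling these eight forms involves, the following Python sketch maps each one to a `(year, month, day)` tuple. It is not the prototype's date class. The function name `normalize_date`, the month/day/year reading of the numeric forms, and the expansion of two-digit years to 19xx are all assumptions made for this example.

```python
import re

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

def normalize_date(text):
    """Return (year, month, day); month and day are None when absent."""
    s = text.strip().rstrip(".")
    # Numeric form such as 5/1/1985 or 5/1/85 (month/day/year assumed).
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})", s)
    if m:
        month, day, year = (int(g) for g in m.groups())
        if year < 100:
            year += 1900  # assumption: two-digit years mean 19xx
        return (year, month, day)
    # Otherwise pick out the pieces in any order.
    year = re.search(r"\b(\d{4})\b", s)
    if year is None:
        raise ValueError(f"no year found in {text!r}")
    month = re.search(r"\b([A-Za-z]{3})[A-Za-z]*\b", s)
    day = re.search(r"\b(\d{1,2})\b", s)
    return (int(year.group(1)),
            MONTHS.get(month.group(1).lower()) if month else None,
            int(day.group(1)) if day else None)

forms = ["1 May 1985", "May 1, 1985", "1985.", "5/1/1985",
         "1985", "5/1/85", "May 1985", "May, 1985"]
for form in forms:
    print(form, "->", normalize_date(form))
```

The eight spellings reduce to three distinct values: a full date, a month and year, and a bare year. A date-range search then compares tuples rather than strings.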

The job of producing a file that is consistent is time-consuming and difficult; duplicate bibliographic
citations are not easily detected. A particular citation usually contains only a subset from the set of data
tags, and different databases may enter certain detail information under different data tags. An example is
the following:

<DATABASE SOURCE> DIALOG NTIS FILE 6 

<TITLE> Online Directory of Databases for Material Properties 

<DATABASE SOURCE> DOE/RECON

<TITLE(MONO)> Online directory of databases for material properties 

The purpose of this project is to further extend the consistency of the detail information found in a
merged file that is the result of down-loading heterogeneous bibliographic citation databases. It is through
the development of abstract data type Smalltalk-80 classes that similar types of information can be
standardized, regardless of source. The standardization of dates, authors, and titles includes accounting
for spaces, punctuation, capitalization, and ordering.
Chapter 3:



The Message-Object Model 



We first establish the foundation for using abstractions in software development. Next, we discuss
the motivation for using abstract data types via Smalltalk-80 classes to solve the data consistency problem
in heterogeneous bibliographic citation databases.

3.1 Abstraction Mechanisms in Modern Programming

Recent work in programming methodology has led to the recognition of three kinds of abstractions:
control, procedural, and data. A large effort has been expended in developing a modern programming
methodology so that software is constructed that is easy to understand, modify, and maintain, and is reliable. The
quality of a program depends on the programming methodology used. The effective utilization of the
methodology is strongly dependent on the programming language selected for the software development.
Certain concepts in the methodology may be difficult to put into place if the language does not provide the
constructs that make the process automatic. The language does influence the way a programmer thinks and
formulates ideas. A good match of the methodology and the language enhances the likelihood that the
methodology will be followed. An example would be to attempt to introduce the concept of block structure
using Fortran 66. A better choice would be Pascal because the language supports block-structured
constructs. While it is true that software can be written in Fortran to simulate the methodology, the job is
unnecessarily enlarged for the software implementer[Lis74][Lis77].
3.1.1 Software Abstractions

What do we mean by software abstractions? We mean that the abstraction isolates the use from the
implementation. That is to say, the abstraction can be used without knowledge of how the
implementation was carried out, and the implementation can be done without knowledge of how it is to
be used[Lis77]. In the early 1950s, we see the application of abstractions in terms of assembly language
rather than machine language expressed in octal numbers. Three-letter acronyms were used instead of an
octal number that represented the operator. Operands were designated by symbolic labels rather than
absolute addresses in memory. Early languages supported built-in data types like integers and reals. One
did not think in terms of binary bits in a computer word at a certain physical location in memory. Later,
type checking aided in appropriate default conversions when a real number was added to an integer.
Hence, the programmer was relieved of low-level detail. Procedural and control abstractions were
dominant. Sorting procedures and square root functions could be specified without requiring knowledge of
the implementation, and the implementation could be done without knowledge of how they were to be
used. Later, control abstractions such as do-loops were made available so the concept of iteration was
abstracted by the language construct. Abstractions were treated as a program organization technique.
Programmers could define macros and define new data types required by a specific problem. We note that
data structures such as stacks and linked lists were first treated systematically in 1968. The idea of
studying and formalizing programming activity dates back to this time[Sha84].

What was recognized in the early 1970s was that programs were difficult to understand and maintain.
With the infamous gotos that spanned a large number of software lines indiscriminately, the term "spaghetti
code" evolved and was a familiar occurrence among programmers. Locality was advocated in terms of
if-then-else or do-while control constructs. For a while, extensible languages were promoted because they
allowed the programmer to add new control constructs and data types to the base language in an attempt to
add clarity to the program and make the programmer's tasks easier. This idea became unpopular since it
was difficult to keep independent extensions compatible, to organize the definitions so related information
was grouped together, and to find a technique to describe the extensions accurately.



The need for more accurate specifications was recognized since programmers typically relied on
procedure headers and parameter lists with accompanying text to define the procedural abstraction. This
specification technique depended on individual styles; some specifications were well written and accurate, while
others were vague or out of date.

3.1.2 Structured Programming Methodology

The structured programming methodology was developed in the 1970s to address these problems: to
make programs reliable and easy to understand, develop, and maintain. It detailed phases in software
development, specified tools needed to assist in the process, and established tests and criteria for program
correctness. Program development was to evolve top-down using the idea of abstractions. First the
statement of the problem was presented and then successive refinements were made until the problem was
finally solved. The idea is to start with a high-level abstraction and then progress by problem
decomposition to recognizing subsidiary abstractions. This is where we find modern programming
languages such as CLU, Alphard, Ada, Concurrent Pascal, Euclid, Gypsy, Mesa, and Modula being developed
to support the structured programming methodology[Sha84].

3.1.3 Abstract Data Types

Procedural and control abstractions were available but the idea of abstract data types needed
promotion. Through abstract data types, the idea of locality would be further extended, making
programs easier to design, implement, and maintain. Specifications would be easier to write because of the
encapsulation of the data structures. Data behaviors could be defined only within the abstract data type.
The requirements of a language supporting data abstractions developed. Linguistic constructs were needed
that implemented data abstractions as a unit in terms of data representations and operations on the data.
The construct would provide a mechanism by which the language would limit access to the representation
except by the operations defined. Smalltalk is such a language, with abstract data types in terms of classes.
CLU has clusters; Ada has packages; Flavors has flavors.
A basic concept is that the operations defined for a class must include all operations needed in handling the
data structure. Usually the operations include create, modify, and access operations. The desirability of
classes is that the language takes care of all the interface specifications, the names for instantiations of the
classes, the assignment, argument passing, and type correctness.

Essential to abstract data types is the primitive library that is provided with the compiler. Here
typical abstract data types such as arrays, AVL trees, bags, and dictionaries are provided, from which the
programmer can develop new abstract data types particular to the application. Inheritance is important in
that new abstract data types are based on the properties defined in a primitive abstract data
type. In fact, the abstract data types are usually arranged in a hierarchical tree so that an
abstract data type inherits all the properties defined between it and the root of the tree.

Abstract data types are the means by which the human can transform problem-domain concepts into
the computer-domain model. In other words, the separation of specification and implementation is the
desired result. The goal is that program correctness at the abstract level can be ascertained before the
implementation. The phrases "abstract data types" and "object-oriented programming" have been used in
various contexts, from Simula and its derivatives such as Ada to powerful data description languages used
in knowledge representation. The meaning we apply is in the Smalltalk-80 context[Cox84].

3.2 Object-Oriented Programming

Object-Oriented Programming replaces the operator-operand concepts that are used in the traditional
computer-domain model. The idea is to introduce a coordination tool that supports change, reusability, and
enhancements. The goal is to transfer work from the human to the machine and to enhance consistency
from the human viewpoint.

Two major concepts of Object-Oriented programming are encapsulation and inheritance.
Encapsulation is an aid in using the system and isolates the objects from the environment except through a
carefully controlled interface. Inheritance is an aid to building the system. New classes are defined by first
inheriting the data and behaviors of older generic classes, then specifying only how the new ones differ.
The idea is to define the data abstractions so the programming task is made minimal.

Now we will define some terms used in a Message-Object programming language such as
Smalltalk-80. The terms object, message, class, instance, and method are all defined in terms of each other.
We will relate the terms to the Objective-C compiler, which is a derivative of Smalltalk-80, and will clarify
them by examples in Chapter 5.

3.2.1 Objects

Objects are virtual (computer-based) machines. They have some data (private part), a set of
operations (shared part), and a run-time mechanism for selecting operations on the data that are activated by
a message sent to the object. They exhibit one of their behaviors when they receive a message.

3.2.2 Messages 

Messages are sent to objects and are requests to obtain a desired result. A message names a
predefined operation (method) to be done on the data structure, and messages are serviced one at a time by the object.
Objects representing numbers have arithmetic operations; objects representing data structures such as AVL trees
create an empty tree and add, delete, modify, or count elements.
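
The run-time selection of an operation by message can be imitated in Python as a rough analogy. This is not Smalltalk-80 or Objective-C syntax; the `Tally` class and the `send` helper are hypothetical names introduced only to show an object with a private part, a shared set of operations, and lookup of the operation at the moment the message arrives.

```python
class Tally:
    """A small object: private data plus the operations that act on it."""
    def __init__(self):
        self._count = 0          # private part

    def add(self, n):            # shared part: operations
        self._count += n
        return self

    def count(self):
        return self._count

def send(receiver, selector, *args):
    """Look up the operation named by `selector` at run time and invoke it."""
    method = getattr(receiver, selector, None)
    if not callable(method):
        raise AttributeError(
            f"{type(receiver).__name__} does not respond to {selector!r}")
    return method(*args)

t = Tally()
send(t, "add", 3)
send(t, "add", 4)
print(send(t, "count"))  # 7
```

The sender names only the receiver and the selector; which code runs is decided by the receiver's class when the message is delivered.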

3.2.3 Classes

A class represents a description of a group of similar objects. A class is the abstract data type and an
object is an instantiation of it. For example, the class rectangle deals with the generic group of rectangles,
but an instance of class rectangle will have specific dimensions for length and width. Binding is done at
run-time so there is no static type checking at compile time. An example would be the class Array in
Objective-C. The subclasses BytArray, IdArray, and IntArray inherit properties from class Array. Hence
an operation such as printOn defined in class Array will work on any of the three subclasses mentioned,
although the data representations differ in terms of byte, Id, or integer. A new subclass defined later
will also be handled correctly, and class Array does not have to be revised to accommodate the
new subclass data type. This is how reusability in data abstractions becomes a major asset in software
development.
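
The Array example can be mirrored in a short Python sketch. It is an analogy only: the class and method names below are modeled loosely on the Objective-C classes named above and do not reproduce that library's interface. The `DateArray` subclass is hypothetical and stands for "a new subclass defined later".

```python
import io

class Array:
    """Generic base class: owns the storage and the shared print_on method."""
    def __init__(self, items):
        self._items = list(items)        # encapsulated representation

    def format_element(self, item):
        return str(item)

    def print_on(self, stream):
        # Written once here. Which format_element runs is decided at run time
        # by the class of the receiving object (dynamic binding).
        body = " ".join(self.format_element(i) for i in self._items)
        stream.write("(" + body + ")")

class ByteArray(Array):
    def format_element(self, item):
        return f"{item:02x}"             # bytes shown as two hex digits

class IntArray(Array):
    pass                                 # inherits everything unchanged

class DateArray(Array):
    """Added later; Array itself is not revised."""
    def format_element(self, item):
        year, month = item
        return f"{year}-{month:02d}"

out = io.StringIO()
for array in (ByteArray([1, 255]), IntArray([3, 4]), DateArray([(1985, 5)])):
    array.print_on(out)
    out.write("\n")
print(out.getvalue())
```

`print_on` never mentions the subclasses, which is why adding `DateArray` needs no change to `Array`.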
3.2.4 Methods

A method is a description of how to do an operation and is specific to the class in which it has been
defined. It resembles a procedure and can use class variables as parameters. Methods are written in a
high-level language like Smalltalk-80, Lisp, or C. The set of methods should include all the operations
needed to work with the encapsulated data, either via inheritance or definition within the class.

3.3 Benefits of Object-Oriented Software

One basic benefit of object-oriented software is reliable, reusable code. In
fact, the classes are called ICs, from the engineering concept of integrated circuits. To start with, one uses
a set of basic classes that form the root of the inheritance tree, which can be systematically augmented by
defining new classes.

To further understand the problem we are addressing, let us look at the Goldstein and
Prettyman[Gol85] analysis of bibliographic sources from four different bibliographic citation databases:
MEDLINE, INSPEC, ISIC, and COMPENDEX.

[MEDLINE] Ann-Trop-Paediatr. 1983 Dec. 17-18. 3(4). P 197-200.

[INSPEC] LASER FOCUS (USA). VOL.19, NO.8. 61-6.

[ISIC] COMPUTER 9(3):11-12

[COMPENDEX]

a) Electronics v 56 n 7 Apr 7 1983 p 155-157.

b) IEEE Trans Magn v Mag-14 n 5 Sep 1978, INTERMAG (Int Magn) Conf,
Florence, Italy, May 9-12 1978 p 964-965.

The parsing of the citation source is a major task in arriving at the information in the canonical form 
suggested. It cannot be fully automated, and it is iterative because of inconsistency in the data, the addition 
of new words to the authority dictionaries, and new valid acronyms, entries, and words. 

Goldstein and Prettyman give an accompanying parsing structure for each of the above citation 
sources. 

[MEDLINE] [title] *[year]*[month] *[day(s)] *[vol]([issue]).*P*[pages]. 
[INSPEC] [title]([country]).*VOL.[volume],*NO.[number].*[pages]. 
[ISIC] [title]*[volume]([issue]):[pages] 
[COMPENDEX] 
a) [title]*v*[volume]*n*[issue]*[month]*[day(s)]*[year]*p*[pages]. 
b) [title]*v*[volume]*n*[issue]*[month]*[days]*[year],* 
[conf. name],*[city],*[country],*[month]*[day(s)]* 
[year]*p*[pages]. 

One observes that there are classes of information common to the different sources. In fact, the 
tasks involved in processing title, volume, and date for data consistency are similar regardless of the 
database of origin or the citation source. There may be variations in case, punctuation, abbreviations, and/or 
format. For example, the date is specified as Sep 1978 or May 9-12 1978 in the COMPENDEX sources. The goal of 
this project is to develop some prototype classes that augment the set of generic classes to provide the 
abstract data types needed to produce data consistency in citations from heterogeneous bibliographic 
databases. 

Chapter 4: 



Prototype Development Environment 

This chapter describes the physical hardware and software methods used to implement the prototype 
classes to process heterogeneous bibliographic citation databases into a consistent form. 

4.1 Computer System 

The work was started on the LLNL Engineering Research Division (ERD) VAX 11/780 running the 
VMS operating system, since it was the only installation at LLNL with the Objective-C compiler at the 
time. The parser development using the Unix tools Lex and Yacc was done on the Tektronix 6205 
workstation. The parser modules were sent over the network to the VAX to be compiled by Objective-C 
along with the prototype class modules, to minimize use of the resources on the VAX. Because system 
resources on the ERD VAX were limited, the work was later completed on the LLNL Technology Information System 
(TIS), which had meanwhile acquired the Objective-C compiler. The TIS VAX 11/780 runs the UNIX operating 
system BSD 4.2; certain program lines needed for compatibility with Objective-C under VMS were removed. In 
general, this environment was simpler for development work, since the VMS port of Objective-C was 
still in progress whereas the port for Unix BSD 4.2 was complete. 

4.2 Software Development Tools 

The Objective-C compiler from Productivity Products International, in conjunction with the C 
compiler, was used to implement the Object/Message model prototype for bibliographic citation databases. 
The Unix tools Lex and Yacc were used to generate the lexical analyzer and the parser, and the tool Make 
aided in software development[PPI85]. 

4.2.1 Objective-C Compiler 

The Objective-C compiler is based on the Smalltalk-80 Message/Object Model. The syntax for 
developing classes in Objective-C resembles the Smalltalk-80 language but differs significantly in that the 
class methods are defined using the C language. The Objective-C compiler is a preprocessor that produces 
C source that is then compiled. The preprocessor produces Class and Phylum files that are information 
repositories and form the basis for inheritance and encapsulation for the classes. 

Smalltalk-80 is the result of 14 years of research and development by the Software Concepts Group 
at Xerox PARC. It is based on a software environment contained entirely within a workstation with special 
hardware to improve performance by orders of magnitude. The Smalltalk-80 environment uses only the 
Smalltalk-80 language and provides the programmer with a repertoire of basic classes. The 
environment includes utilities usually provided by the computer operating system, such as the text editor, 
compiler, and debugger. The environment makes extensive use of graphics windows, pull-down menus, 
and a pointing device so the user can work on several views of the work in progress. To change text under 
software development, the user points at the line, edits it, issues the compile command, removes syntax 
errors, tests the software, and then compiles and links the new software into the system. All this is done 
without changing "modes" for editing, compiling, filing, or executing. 

The Objective-C compiler is different in that it is one of the many tools the software developer can 
add to the utilities offered by the operating system. It is available in the VAX VMS operating system 
environment as well as on computer systems with the Unix BSD 4.2 operating system. It is a preprocessor 
to the C compiler and adds the basic Smalltalk-80 concepts of classes, objects, messages, encapsulation, and 
inheritance. Objective-C is an object-oriented programming language layered on top of C, so it can be 
used in addition to the existing software and hardware. 

Diagram of Compilation Units[PPI85] 

Objective-C Class Libraries 

Included with the Objective-C compiler package are the Basic Class Library and the Foundation 
Class Library, which establish the root of the hierarchy of reusable classes from which classes for the specific 
application are developed. A class developed for the application inherits the properties of the classes between 
the root and itself. The hierarchy of classes provided with the Objective-C compiler is presented graphically 
in Appendix A. 

The Basic Library contains the classes Nil, Object, Array, IdArray, and String. The root of the 
inheritance hierarchy is class Object, which points to the Nil class. Every object inherits all the methods and 
instance variables available in class Object. Class Array is detailed here to give an idea of the methods this 
class supports. Array is a superclass of several classes that support indexed instance variables. It has an 
instance variable, capacity, that records the number of elements in the array. Methods are defined for instance 
creation with n elements that may or may not be initialized from an argument list. Methods are also defined for 
copying, inquiring on capacity, printing to an I/O device, comparing and hashing, and notifying on bounds 
violations. 

The Foundation Library contains the classes Assoc, AVLDict, AVLTree, Bag, BytArray, Cltn, 
Dictionary, IntArray, OrdCltn, Point, Rectangle, Sequence, Set, Stack, and 
Unknown. 

Diagram of Hierarchy of Classes in Basic and Foundation Library[PPI85] 

The implementer of an Object/Message application must be familiar with the available classes to 
use appropriately the inheritance properties inherent in the class hierarchy. In the prototype 
implementation, the class Object was used. In the discussion of future work in Chapter 7, the development 
of other classes is described to support the task of creating consistency in the heterogeneous bibliographic 
citation database. 



4.2.2 Unix Tool: Lex 

The Unix tool Lex is a program or module generator. The basic model for Lex is based on the 
theory of regular expressions[Aho74]. It generates a module that is a deterministic finite state automaton. 
The input to Lex consists of user-specified rules in the form of regular expressions. Regular 
expressions are rules for specifying character strings to be matched and include operator characters to 
account for repetition of strings, optional or required occurrences of strings, and the ordering of strings. 
The user may associate a procedure with a rule so that further processing is done when a rule is matched. For 
example, if a rule in the form of a regular expression expects a number, the associated procedure may 
verify that the number is in an expected range and flag an error if it is not valid[Les75]. Lex generates the 
module that does lexical analysis on the input character stream consisting of the detail information 
associated with a data tag in a bibliographic citation. The tokens and optional values are passed to the 
parser. 

4.2.3 Unix Tool: Yacc 

Yacc is a tool that generates a program or module called the parser. Yacc is based on context-free 
grammars, using Backus-Naur Form (BNF) descriptors to specify the parser that accepts the language. The 
formal discussion is found in [Aho74] and a user's manual in [Joh75]. The input to Yacc consists of user-specified 
grammar rules and optional procedures to be invoked when a grammar rule is recognized. The parser 
includes a call to the lexical analyzer, which passes tokens and optional values recognized from the input 
character stream. The parser does a syntactic analysis and performs the associated actions if the input satisfies 
the grammar rule. For the prototype, the grammar rules include all the legal variations in the detail 
information for a data tag in a bibliographic citation. 

4.2.4 Unix Tool: Make 

The Unix tool Make is a software management tool that allows dependencies to be specified by the 
user among software modules. Changes to a source file are automatically detected and trigger the 
appropriate actions specified in the dependency rule. For example, modifications to a source file could 
trigger recompilations of other dependent source files. 

4.3 Summary 

The software prototype was developed in the Unix BSD 4.2 software environment, using the Unix 
tools Lex, Yacc, and Make and the Objective-C compiler. The C compiler was used to develop the software. 
The next chapter discusses implementation of the prototype and how the tools are used in the 
implementation. 

Chapter 5: 



Prototype Implementation 



This chapter introduces the basic data abstraction mechanism in Objective-C, the class. A prototype 
for processing heterogeneous bibliographic information is described to show how the abstraction is used in 
program design and how it is used and implemented in Objective-C. A system overview that details the 
major steps in producing the prototype is diagrammed. 

5.1 Sources of Data 

The source of data could be the result of a session by a user at a terminal making queries of an 
on-line system, such as the Dialog system, that involve the search of bibliographic citations on a topic. The 
output is usually in the form of a display of the retrieved citations and may be followed by a more complete 
printout of the citations. In our case, the facilities at the LLNL Technology Information System (TIS) were 
used to obtain bibliographic citations on the subject of "Computer Gateways and Networks" from the 
following six on-line database services: DTIC/DROLS-TR, DIALOG NTIS FILE 6, BRS, DOE/RECON, 
NASA/RECON, and SDC/LIBRARY AND INFORMATION SCIENCE ABSTRACT. An on-line session 
with each database service was used to capture the information into a local file. The citations in 
each local file were translated into the TIS standard form for bibliographic citations. The six local files were 
then merged into a single file so that post-processing analysis could be done on a single file. A sample of 
the merged file is included in Appendix D. 

Each bibliographic citation consists of an average of twenty fields of information. Each field begins 
on a new line and consists of a data tag delimited by left and right angle brackets (<,>), followed by the 
descriptive information. In database terminology, one can consider the data tag as a field label and the 
descriptive information as the field detail. 
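
As an illustration of this field layout, the following C sketch splits one field line into its data tag and 
its detail. It is not part of the prototype: the function name split_field and its buffer-based interface are 
assumptions made for this example only. 

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the prototype): split "<TAG> detail".
 * Copies the tag, without its angle brackets, into tag[] and returns a
 * pointer to the first non-blank character of the detail.
 * Returns NULL if the line does not start with a well-formed tag or if
 * the tag does not fit in the caller's buffer. */
const char *split_field(const char *line, char *tag, size_t tagsize)
{
    const char *close;
    size_t len;

    if (line[0] != '<')
        return NULL;
    close = strchr(line, '>');
    if (close == NULL)
        return NULL;
    len = (size_t)(close - (line + 1));
    if (len + 1 > tagsize)
        return NULL;
    memcpy(tag, line + 1, len);
    tag[len] = '\0';

    /* Skip the blanks that separate the tag from the detail. */
    close++;
    while (*close == ' ' || *close == '\t')
        close++;
    return close;
}
```

Called on the line "<DATE> Aug 1984", the sketch yields the field label "DATE" and the field detail 
"Aug 1984", which could then be handed to the routine responsible for that tag. 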

5.2 Reformatting the Detail Information for Consistency 

On closer examination of the bibliographic citations in the merged file, one finds that similar types of 
information may be represented in differing formats if they come from different database sources. There 
may also be varying formats within a database for items coming from different publication types. For example, 
"<DATE> 1985." appears in a BRS/National Library of Medicine Database record, whereas "<DATE> 
Aug 1984" appears in a DIALOG NTIS FILE 6 citation. Another problem is that "<TITLE> PLURIBUS 
SATELITE IMP DEVELOPMENT MOBILE ACCESS TERMINAL NETWORK" appears in upper case 
in the DTIC/DROLS-TR citation, but "<TITLE> An on-line directory of databases for material properties" 
appears in lower case, except for the first word, in the NASA/RECON citation database. One can make the 
observation, however, that similar "classes" of information occur in bibliographic citations. 

The task of reformatting the detail information for consistency is a complex job. The detail 
information from different database sources may appear with a different data tag. An example is 
"<TITLE> Post-processing of Bibliographic Citations from DOE/RECON, NASA/RECON, and 
DOD/DROLS. Revision 1." from the DIALOG NTIS FILE 6, whereas the same citation in the DOE/RECON 
database has "<TITLE(MONO)> Post-processing of Bibliographic Citations from DOE/RECON, 
NASA/RECON, and DOD/DROLS. Revision 1." The task of consistency may include a cross correlation 
of information. If the title is not available with the <TITLE> data tag, the information may be available 
with the <TITLE(MONO)> data tag. Hence a duplicate may be detected and removed. Typically, one 
may request a yearly count of articles written on a subject to ascertain the emerging importance of research 
in the area. As pointed out in Chapter 2, an estimated thirty-five percent of the bibliographic citations 
are duplicates[Hai83], so the accounting of duplicates is important. 

5.3 Program Design Abstractions 

Consider the merged file as a data abstraction called in-stream, and a data abstraction called 
out-stream that will contain bibliographic citations in a consistent format. We will need procedural 
abstractions that indicate when in-stream is empty and that determine the next data tag and data field pair. We 
can consider each data tag and data field pair as an abstraction. Hence, we can arrive at abstract data types 
for "date", "title", "author", etc., that are based on the data tags found in the merged file. 

The <DATE> abstraction is presented with details of its implementation. The bibliographic data 
tags such as <DATE>, <AUTHOR>, or <TITLE> are handled as left context operators. They trigger 
environments that are very dissimilar. On closer examination, the information associated with <TITLE> is 
considered as a string, whereas the information associated with <DATE> is considered on a word basis, 
where a word is any nonempty sequence of alphanumeric characters. Adjacent words may be separated by 
non-alphanumeric characters such as space, punctuation, or newline. Hence the lexical rules and actions must 
be specified separately for these two different environments. In looking at the <AUTHOR> and <DATE> 
detail information, the parser rules and actions must also be specified individually. An author June E. 
Smith has a first name of "June", whereas June should be handled as the sixth month if it is a date. 
The handling of left context sensitivity is described in the Lex reference[Les75]. Once the data 
tag has been identified, separate lexical and parser routines associated with Lex and Yacc rules are 
called to process the information. We can think of Lex and Yacc as procedural abstractions in the 
development of our prototype class. The Unix tools Yacc and Lex produce C modules of advanced 
algorithms in a convenient form that can be easily integrated into the prototype application program. These 
program generators do special jobs based on user specifications that are easy to update. Yacc produces the 
module "yyparse" and Lex produces the module "yylex". The user can insert C code before, within, and 
after the call to either module, which adds a large amount of flexibility. The modules generated are special 
purpose and have excellent performance in terms of time and space. They save the user from writing 
C code by hand and hence free the programmer from details that are conceptualized as procedural 
abstractions. 

5.4 The Prototype 

To show the ease of creating Objective-C classes, the prototype for the Date class is described. The 
prototype consists of the Lex and Yacc specification files, the Date class data abstraction, and the main 
program module. The tutorials on Lex and Yacc were helpful in developing the specification files[Bel78]. 

5.4.1 Lex Specification File 

The general format of Lex input is: 

{definitions} 
%% 
{rules} 
%% 
{user routines} 

The definition section is: 

%{ 
#include "objc.h" 
#include "y.tab.h" 
#define MON(x) { yylval.lex = x; return MONTH; } 
= (N,Collection,Primitive) 
%} 

The include file "objc.h" contains most of the standard definitions for the user of the Objective-C 
compiler. The file contains various C types such as STR for string, SEL for selector, BOOL for boolean, 
IOD for I/O descriptor, and SHR for the shared part of an object. The include file y.tab.h is created by Yacc 
and contains the tokens used for communication between the lexical analyzer and the parser. The macro 
MON(x) is defined to assign a value to yylval.lex that is returned to the parser. Values returned by the 
lexical analyzer and associated action procedures are integers by default. The rules to Yacc can define 
other types that the parser tree handles, so that the stack properly carries out the shifts and reductions to 
determine an accepting state for the statement being parsed. The Yacc discussion covers the union of types 
that accounts for the suffix ".lex". The last statement is an Objective-C declaration for the Phyla files. 

The rules section consisting of regular expressions is: 

%% 
[jJ]an("."|uary)?     MON(1); 
[dD]ec("."|ember)?    MON(12); 
[0-9]                 { yylval.lex = yytext[0] - '0'; return DIGIT; } 
[ ]                   { ; /* delete blanks */ } 
"\n"                  { return EOL; } 
.                     { return yytext[0]; } 



In the regular expression [jJ]an("."|uary)?, the month is allowed in different forms, i.e. jan, jan., 
january, Jan, Jan., or January. The macro MON(x) is the action statement, where the value returned is an 
integer, that is, 1 for January, 2 for February, etc. The value is stored in yylval.lex, and MONTH is the 
token returned. The characters 0 through 9 are recognized by the regular expression [0-9], and the action is 
to return the integer value for the character representation and DIGIT for the token. The regular expression 
[ ] deletes blanks since its action statement does nothing. The regular expression "\n" recognizes end-of-lines 
and returns the EOL token. The regular expression consisting of a single period (.) recognizes any other 
character, and the action statement returns the single character. 
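
To make the month rules concrete, the following C sketch accepts the same spellings that a rule of the form 
[jJ]an("."|uary)? accepts, for all twelve months. It is written by hand for illustration and is not the code 
that Lex generates; the names month_number and month_names are hypothetical. 

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical table of full month names, in lower case. */
static const char *month_names[12] = {
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december"
};

/* Return 1..12 if word is a month written as its three-letter
 * abbreviation, the abbreviation followed by a period, or the full name,
 * with the first letter in either case. Return 0 otherwise. */
int month_number(const char *word)
{
    int m;

    for (m = 0; m < 12; m++) {
        const char *name = month_names[m];
        const char *rest;

        /* First letter may be upper or lower case, as in [jJ]. */
        if (tolower((unsigned char)word[0]) != name[0])
            continue;
        /* The next two letters must be lower case, as in "an". */
        if (strncmp(word + 1, name + 1, 2) != 0)
            continue;
        /* Optional tail: nothing, ".", or the rest of the full name. */
        rest = word + 3;
        if (*rest == '\0' || strcmp(rest, ".") == 0 ||
            strcmp(rest, name + 3) == 0)
            return m + 1;
    }
    return 0;
}
```

The returned integer plays the role of the value that MON(x) stores in yylval.lex. A word such as "Smith" 
yields 0, which is why the surrounding data tag, and not the word alone, must decide whether "June" is a 
month or a first name. 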

The last section defines the procedure "date(month,day,year)", which checks that the month is in the range 
1-12 and that the day is valid for the month. Leap years are taken into account in the number of days in a month. 
Terse error warnings are included that could be changed to more sophisticated error recovery actions. See 
Appendix C for the details. Hence the lexical analyzer module, yylex, should be able to recognize the 
tokens in the eight variations for "date" that are tabulated in Chapter 2. 
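
The procedure itself is listed in Appendix C. The following C sketch only illustrates the kind of range 
check described above; it assumes the Gregorian leap-year rule, and the names is_leap_year, days_in_month, 
and date_is_valid are hypothetical rather than taken from the prototype. 

```c
/* Gregorian rule (an assumption of this sketch): every fourth year is a
 * leap year, except century years not divisible by 400. */
int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Number of days in the given month, or 0 if month is outside 1..12. */
int days_in_month(int month, int year)
{
    static const int days[12] = {
        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
    };

    if (month < 1 || month > 12)
        return 0;
    if (month == 2 && is_leap_year(year))
        return 29;
    return days[month - 1];
}

/* Nonzero if month is in 1..12 and day is valid for that month. */
int date_is_valid(int month, int day, int year)
{
    int limit = days_in_month(month, year);

    return limit != 0 && day >= 1 && day <= limit;
}
```

A caller could print a terse warning when date_is_valid returns zero, in the spirit of the error warnings 
mentioned above. 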

5.4.2 Yacc Specification File 

We now describe the specification file that is input to Yacc to generate the module yyparse. The 
general form looks like: 

declarations 
%% 
rules 
%% 
programs 

The declaration section is: 

%{ 
#include "objc.h" 
= (N,Collection,Primitive) 
extern id dateObj; 
%} 
%union { 
    short lex; 
    id obj; 
} 
%start prog 
%token<lex> DIGIT MONTH 
%token<lex> EOL 
%type<lex> number year day 
%type<obj> DateStmt 

In the declaration section we have the include file objc.h and the phyla declaration that were 
described in the previous section on Lex. The external declaration of the instance of the Date class, 
dateObj, is required since dateObj is created in the main program. The union statement defines the two 
data structures on the parser tree, the "lex" integer data structure and the Objective-C "obj" id data 
structure. The goal symbol, prog, is defined by the %start statement, and the legal lexical tokens that yylex 
recognizes are DIGIT, MONTH, and EOL. Number, year, and day are parsed by yyparse and have the 
"lex" integer data structure. The DateStmt has the "obj" id data structure. 

The rules section is: 

%% 
prog: DateStmt EOL { exit(0); } ; 
DateStmt: MONTH day '/' year 
        { 
            date($1, $2, $4); 
            $$ = [dateObj mo: $1 da: $2 yr: $4]; 
            [dateObj print]; 
        } 
    | ... 
    ; 
day: number; 
year: number; 
number: DIGIT | number DIGIT { $$ = 10 * $1 + $2; }; 

The rules section specifies the BNF grammar for parsing the legal forms of date. The date procedure 
checks that the number of days is within the correct range for the month, with leap year taken into 
consideration. 
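
The left-recursive rule for number builds a decimal value one digit at a time: each reduction multiplies 
the value accumulated so far by ten and adds the new digit. A minimal C sketch of the same accumulation 
follows. The function name number_from_digits is hypothetical, and the sketch reads a character string 
instead of receiving DIGIT tokens from yylex. 

```c
/* Mirror of the grammar action { $$ = 10 * $1 + $2; }.
 * Returns the decimal value of a nonempty string of digits,
 * or -1 if the string is empty or contains a non-digit. */
int number_from_digits(const char *digits)
{
    int value = 0;

    if (*digits == '\0')
        return -1;
    for (; *digits != '\0'; digits++) {
        if (*digits < '0' || *digits > '9')
            return -1;
        /* value is $1, the new digit is $2. */
        value = 10 * value + (*digits - '0');
    }
    return value;
}
```

For the input 1985 the intermediate values are 1, 19, 198, and 1985, which is the sequence of values the 
parser would hold for number as each DIGIT is reduced. 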

The following statement: 

$$ = [dateObj mo: $1 da: $2 yr: $4]; 

stores the month, day, and year values in the object dateObj. The Objective-C message expression is 
contained between the pair of square brackets ([...]). The message is sent to the receiver, dateObj. There 
are three keyword selectors, mo, da, and yr, each consisting of a string of characters ending in a colon 
character. The arguments to the keyword selectors are $1, $2, and $4, which are obtained from the parse tree. 
This is an invocation of a method defined in the Date class and is a behavior in addition to the instance 
methods that class Date inherits from the Object class. 

[dateObj print]; 

The print method is defined in the Date class and defines a behavior for printing the values stored in 
the dateObj object for month, day, and year. The user simply invokes the print method and is not 
encumbered by the details of the data structures of month, day, or year to print the information correctly. 
In contrast, the Fortran programmer must know whether the month, day, or year is in ASCII, octal, or 
integer format to select the proper conversion specification in the "Format" statement. The proper 
definition of the methods in a class should encompass the create, modify, and reply operations so that the 
user's requirements in working with the class object are completely met. 

The program section is the last section and contains an error diagnostic that prints a warning to the 
user if the input cannot be parsed by the grammar rules contained in the input specification file for Yacc. 
One may observe at this point how terse the software is that does all this work. The extraneous characters for 
space, /, and variations in the date format are handled with a minimum amount of software. The values for 
month, day, and year are stored as instance variables in the object dateObj through the method defined 
within the class Date, and the print operation is easily invoked since the details are encapsulated as a 
method in the class Date. 

5.4.3 Date Class 

The Date class is defined in the source code file "date.m". The declaration section has the 
Objective-C include file, objc.h, and the Yacc include file, y.tab.h. Next, the declaration of the ASCII 
representations of the months is included for the print method. 

The following statement: 

= Date:Object (N,Collection,Primitive) 

reflects that the Date class inherits properties from the Object class and that the Date class will be included in 
the writable phylum file "N". Also, the Date class may use the classes in the Objective-C libraries 
Collection and Primitive. The instance variables are declared to be integers for month, day, and year, and 
are called mon, da, and yr respectively. The first method, prefaced with "-mo: ...", stores the values in the 
instance object. The next method, denoted by "-print", prints the date to the terminal. The print method 
tests for the default values of -1 and varies the printout accordingly. The three sample printout forms are: 

1 May 1985 

May 1985 

1985 
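
The choice among these three forms can be sketched in C as follows. This is an illustration of the rule 
described above and not the print method of the Date class; the function name format_date, the UNSET 
constant, and the three-letter month table are assumptions made for the example. 

```c
#include <stdio.h>

#define UNSET (-1)   /* default value of a field that was never stored */

/* Hypothetical table of month names used for printing. */
static const char *month_abbrev[12] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/* Write the date into buf in one of the forms "1 May 1985", "May 1985",
 * or "1985", depending on which fields are set. Returns the number of
 * characters the full text needs, or -1 if the month is out of range. */
int format_date(char *buf, size_t size, int mon, int da, int yr)
{
    if (mon == UNSET)
        return snprintf(buf, size, "%d", yr);
    if (mon < 1 || mon > 12)
        return -1;
    if (da == UNSET)
        return snprintf(buf, size, "%s %d", month_abbrev[mon - 1], yr);
    return snprintf(buf, size, "%d %s %d", da, month_abbrev[mon - 1], yr);
}
```

Because the caller passes only the three integers, the decision about which form to print stays in one 
place, which is the encapsulation the print method provides. 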

5.4.4 Main Module 

The main program, contained in the file "main.m", begins with the include file for the C compiler 
standard I/O library, stdio.h, and the Objective-C include file, objc.h. The phyla declaration statement for 
the main program follows. The externals are declared in addition to the instance object, dateObj. The 
main program sets the output to be the terminal, which is the Unix standard output device. 

The statement: 

dateObj = [Date new]; 

creates the object for the Date class. Since the method "new" is not defined in the Date class, the method 
is inherited from the Object class. The prompt ">" is printed at the terminal, and input is then expected 
from the user at the terminal so that it can be parsed and have its values for month, day, and year stored into 
the date object just created. The print method is then invoked to verify that the proper values are stored in 
dateObj for month, day, and year. The last two statements declare the classes and phyla that can be used in 
this application program. 



Chapter 6: 



Summary and Results 



The intent of the prototype implementation is to provide a programming example of the class data 
abstraction mechanism of Objective-C as applied to the Date class to obtain data consistency in the varying 
forms of dates that are contained in bibliographic citations. Through a simple example, features of the 
abstraction mechanism in Objective-C have been presented. The Unix tools Lex and Yacc were used to 
develop the procedural abstractions yylex and yyparse, which do the lexical analysis and syntactic analysis 
on the varying date forms. Eight variations of dates consisting of month, day, and year were established in 
the dateObj object for the Date class. With the instance variables set to specific values, the print method 
could be invoked to print the date in a consistent form. The private data and data access methods are 
encapsulated within the Date class, which requires that the user communicate through messages to the object 
to elicit the behaviors desired. 

The Date class is an elementary example to show how other classes for the bibliographic citation 
database can be developed for accomplishing data consistency in the numerous fields in a bibliographic 
citation. The Date class can easily be extended to include more methods, categorized as setting, inquiring, 
performing arithmetic, and printing. 

Setting: 

1. -setmo: aMonth      set the month 
2. -setda: aDay        set the day 
3. -setyr: aYear       set the year 

Inquiring: 

1. -getmo: aMonth      reply the month 
2. -getda: aDay        reply the day 
3. -getyr: aYear       reply the year 

Performing Arithmetic: 

1. -Julian             reply the Julian day 
2. -dayofyear          reply the nth day of year 

Printing: 

1. -printmo            print the month 
2. -printdy            print the day 
3. -printyr            print the year 
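
As an example of the arithmetic category, the computation behind a method such as -dayofyear can be 
sketched in C. The sketch assumes the Gregorian leap-year rule; the function names day_of_year and 
is_leap_year are hypothetical and are not taken from the prototype. 

```c
/* Gregorian leap-year rule (an assumption of this sketch). */
static int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Return the ordinal day within the year (1 January is day 1),
 * or 0 if month is outside 1..12. The day is not range-checked. */
int day_of_year(int month, int day, int year)
{
    /* Days that precede the first of each month in a common year. */
    static const int days_before[12] = {
        0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
    };
    int n;

    if (month < 1 || month > 12)
        return 0;
    n = days_before[month - 1] + day;
    if (month > 2 && is_leap_year(year))
        n++;   /* account for 29 February */
    return n;
}
```

With such a method in the class, the difference between two dates in the same year reduces to a 
subtraction of two integers, independent of how the date was written in the source database. 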

The goal is to develop a comprehensive Date class to simplify the task of constructing reliable 
software that is easy to understand, modify, and maintain. This Date class will be part of the Class Library 
that is accessed by application programmers, who will rely on the skill of the designer who develops the 
abstraction. The classes must be defined such that the behaviors of the class of information are fully defined. 
These include the create, modify, and reply operations. In the event that additional behaviors are necessary, 
the abstraction mechanisms in a programming language such as Objective-C will guarantee that 
software will not have to be re-examined or rewritten because of the change. 

We briefly describe how the <AUTHOR> and <TITLE> classes can be defined and used in the 
application for data consistency in heterogeneous bibliographic citation databases. The main program is 
expanded to examine the in-stream of data and look for the "<AUTHOR>" or "<TITLE>" data tag. This is 
easily done since the data tags are enclosed in left and right angle brackets. The characters following 
the right angle bracket are saved in a buffer until a left angle bracket is detected. This buffer of characters 
is then passed as data input to the parser developed for the particular data tag information. 

In the TITLE data tag, the Lex specification file will have the action statement convert the text to 
upper case for consistency, and then will store the title into the object: 

yylval.obj = [String str: yytext]; 
return STRING; 

The Yacc specification file will contain the action statement: 

$$ = [titleObj str: $1]; 
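
The upper-case conversion mentioned for the TITLE action can be sketched in C as follows. The function 
name title_to_upper is hypothetical; in the prototype design the conversion would take place inside the Lex 
action before the String object is created. 

```c
#include <ctype.h>

/* Hypothetical helper: convert a title to upper case in place and
 * return it, so that titles from different databases compare equal
 * regardless of the case used by the source. */
char *title_to_upper(char *title)
{
    char *p;

    for (p = title; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return title;
}
```

After this conversion a title recorded in mixed case by one database and in upper case by another can be 
compared character by character, which supports the detection of duplicate citations. 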

In the case of the AUTHOR data tag, the buffer of characters captured after detecting the Author tag 
is passed to the Author parser, which has BNF specifications to handle the variations in author names. The 
author list could be saved in the Set class. The creation of an Author object could include an initialization 
that would give a wild card character like "*" for the first or middle name in cases where the names are 
missing from the input stream. The methods defined for the author class could treat the names as wild 
cards when a match is required. 
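
A minimal C sketch of such wild card matching is given below. It is one possible reading of the idea and 
not a design taken from the prototype: the struct author layout and the function names are hypothetical, 
and the sketch assumes that the last name is always present and must match exactly. 

```c
#include <string.h>

/* Hypothetical record for one author; a missing first or middle name
 * is stored as the wild card string "*". */
struct author {
    const char *first;
    const char *middle;
    const char *last;
};

/* A name part matches if either side is the wild card or both are equal. */
static int part_matches(const char *a, const char *b)
{
    return strcmp(a, "*") == 0 || strcmp(b, "*") == 0 || strcmp(a, b) == 0;
}

/* Nonzero if the two records may denote the same author. */
int author_matches(const struct author *x, const struct author *y)
{
    return strcmp(x->last, y->last) == 0
        && part_matches(x->first, y->first)
        && part_matches(x->middle, y->middle);
}
```

Under this sketch a citation that records only "Smith" matches one that records "June E. Smith", while 
"Jane E. Smith" and "June E. Smith" remain distinct. 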

The next logical development is to define a citation object that contains the Author, Title, and Date 
objects as a related triple. 

extern id String, Set; id citationObj; 

citationObj = [self with: 3, 
        [dateObj str], 
        [titleObj str], 
        [authorObj str]]; 

Methods could be defined to create, add, delete, or modify a citation, in addition to printing the citation in 
"pretty" forms for easy user viewing. 

The prime idea in defining classes for the heterogeneous bibliographic citation databases is to present the 
application programmer with abstractions that handle the data types involved and include all methods to 
process the abstract data types. Hence the objects are the entities that are handled by the application 
programmer, which reduces the details that must be remembered. The particular class should characterize the 
behavior of the data entirely. If not, additional methods may be added to the class definition. Indeed, even 
if this is done, software that has been written based on the former class definition may not have to be 
rewritten unless it accesses the new features in the class. The underlying physical structure of the program 
is taken care of by the physical interfaces used by the Objective-C compiler. The basic actions in 
programming the application are assignment statements that create objects and invocations of class 
methods through messages to the objects to exhibit behaviors. 

Chapter 7: 



Discussion and Future Directions 



In recent years a variety of powerful generic tools have been created. Database Management
Systems (DBMS) and spreadsheets are examples. They gain their power from the ability to operate on
various data. They provide the generic operations of create, modify, and output. We have attempted to
create such a tool for data conversion. This study was restricted to bibliographic citations to see how far the
idea of a generic library tool can be extended. The development of the generic library tool requires the
definition of classes which the application programmer incorporates into user software. The concept of
abstract data types via classes can be extended to Database Management Systems. If one considers the
relational model, then the relations in the form of tables can be considered the data structure of the class.
The operations of retrieve, update, and append with qualifiers can be considered the class methods. This
abstraction is a convenient one for the application programmer since tables of information are a common
occurrence. But a detailed look at the physical implementation of the data structure may be complex. The
storage and access mechanisms may be based on hashing algorithms if the data are sparse and have a
balanced distribution. B-trees may be used with linked lists for fast searches. Here the user is relieved of
the complexities, which are left to the Database Management System implementers. To access the relations
the user relies on the query language that allows operations on the relations. In this same regard, the person
developing the classes for an Object Oriented application must provide the application programmer with
the necessary classes to do a job. The classes must be general enough to handle application programs that
have not yet been defined. This is what a good Database Management System provides, and is what the
class library for the application should provide. Of course, Database Management Systems are always
being enhanced to do a better job for the user, and it is expected that the class library will be improved with
time. What is important is that the user will not have to rewrite any software that has been developed.
Even if the underlying physical structure is changed to improve speed or space, the user need not be
concerned, and all the benefits will be automatically gained. One can now readily understand the strength
in using abstractions. Through Object Oriented Programming the abstraction mechanism found in
Database Management Systems and spreadsheets can now be extended to programming languages through
abstraction mechanisms provided in languages like Smalltalk-80 and Objective-C.
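
The relational view described above can be sketched as a class whose data structure is a table and whose methods are retrieve, update, and append with qualifiers. The sketch is in Python for brevity; the class name `Relation`, the method signatures, the `dept` field, and the list-of-dictionaries storage are all assumptions made for illustration.

```python
class Relation:
    """Hypothetical table abstraction; callers never see the storage layout."""

    def __init__(self, *fields):
        self._fields = fields
        # Physical structure: a plain list here, but it could become a hash
        # table or a B-tree without changing any caller.
        self._rows = []

    def append(self, **values):
        """Add one row; every declared field must be supplied."""
        if set(values) != set(self._fields):
            raise ValueError(f"expected fields {self._fields}")
        self._rows.append(dict(values))

    def retrieve(self, where=lambda row: True):
        """Return copies of the rows that satisfy the qualifier."""
        return [dict(row) for row in self._rows if where(row)]

    def update(self, where, **changes):
        """Apply changes to qualifying rows; return how many rows changed."""
        unknown = set(changes) - set(self._fields)
        if unknown:
            raise ValueError(f"unknown fields {sorted(unknown)}")
        count = 0
        for row in self._rows:
            if where(row):
                row.update(changes)
                count += 1
        return count


employee = Relation("name", "dept")
employee.append(name="John Jones", dept="Library")
employee.update(lambda row: row["name"] == "John Jones", dept="Systems")
print(employee.retrieve(lambda row: row["dept"] == "Systems"))
```

Only the three methods are visible to the application programmer, so replacing the list with a faster structure would not require any calling code to be rewritten.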

This project has demonstrated the feasibility of establishing data consistency in heterogeneous
bibliographic citation databases through data abstractions, called classes. Future work involves specifying
and implementing the full set of classes for this application. With the classes in place, the application
programs can be written to further the data consistency goal.

We have discussed the biblio-citation object consisting of the title, author, and date objects. The
objects associated within the citation object should be expanded to include the necessary elements for
identifying a bibliographic citation. This requires the establishment of a canonical form for a bibliographic
citation. A study of the bibliographic citation format from different sources shows that the data tag names
are diverse and many are singular. For example, the DOE/RECON database uses "<PAGE NO> 17",
whereas the DTIC/DROLS-TR has "<PAGINATION> 30P". Goldstein and Prettyman have proposed a set
of 36 fields for the citation canonical form, which appears in Chapter 2. They propose two-character data
tags, such as PG for the number of pages in the reference. Their canonical form is based on bibliography
preparation. The data fields for the general case need to be studied and proposed. On a cursory glance,
the expanded canonical form should include "AB" for abstract, and "KW" for keyword descriptors. We
note that singular data tags that probably are only meaningful to the local bibliographic database, such as
"<LIMITATION CODES> 1", can be excluded from the canonical form of the citation. With the data tags
and associated data elements defined for the canonical form of a bibliographic citation, the definition of
classes for data consistency can proceed. The Date class can be re-used for the journal date, publication
year, copyright year, and the meeting date. The definition of a Location class is appropriate for the meeting
location, publication location, and author location. This class should access an abbreviation dictionary to
produce a consistent form of the location.
If the location is listed as London, then London, England should be substituted. The locations US, U.S.A.,
and United States should be made consistent in the same fashion. Warnings should be included for data not
found in the dictionary, so that it may be updated with new entries. The standardization of publication
titles can be added to a Source class. Certainly, the conversion for case consistency in character strings,
and the expansion of abbreviations should be included in the class methods. Alternate names for people or
institutions could be accessible from a dictionary to further aid in data consistency. We note that the
Dictionary class is available in Objective-C and can be incorporated into a class.
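
The dictionary lookup proposed for the Location class might look like the following Python sketch. The function name `normalize_location`, the table `LOCATION_FORMS`, and the choice of "United States" as the consistent form are assumptions for illustration; the text only requires that the variants be made consistent and that unknown entries raise a warning.

```python
import warnings

# Hypothetical abbreviation dictionary: lower-cased variant -> consistent form.
LOCATION_FORMS = {
    "london": "London, England",
    "us": "United States",
    "u.s.a.": "United States",
    "united states": "United States",
}


def normalize_location(raw, dictionary=LOCATION_FORMS):
    """Return the consistent form of a location, warning if it is unknown."""
    key = " ".join(raw.split()).lower()  # collapse whitespace, ignore case
    if key in dictionary:
        return dictionary[key]
    # Not in the dictionary: flag it so that a new entry can be added later.
    warnings.warn(f"location not in dictionary: {raw!r}")
    return raw.strip()


print(normalize_location("London"))
print(normalize_location("U.S.A."))
```

Returning the unknown value unchanged, instead of raising an error, keeps a batch conversion running while still recording what the dictionary lacks.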

A future expansion should include the post-processing tasks in terms of the classes defined in the
application tool library. Methods could be included to "pretty-print" a bibliographic citation, to analyze
bibliographic text, to display the citations on the CRT screen, to plot the statistical information on a graph,
and to do cross-correlations on the data fields. The convenient tools of Unix can be incorporated into the
classes since Objective-C is designed with the use of Unix tools in mind. We have seen how the Unix tools
Lex and Yacc were incorporated into the Objective-C program.

The procedure of establishing data consistency in a heterogeneous bibliographic citation database
through the definition of abstract data types can be extended to other heterogeneous databases. The
restriction is that the information in the heterogeneous databases derive from a common base, as in
bibliographic citations. Hence for a relational database where a relation is employee, a field in the relation
is name, and its detail information is John Jones, the data tag could be <employee.name>, and the detail
field would be John Jones. The existence of a data tag and an associated detail field in the database
establishes the reuse of the data abstractions created for the bibliographic citation database.
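
The tag-plus-detail convention lends itself to a small, source-independent parser. The Python sketch below splits a line into its data tag and detail field and maps source-specific tags to a canonical two-character tag. The names `parse_tagged_line`, `to_canonical`, and `CANONICAL_TAGS` are hypothetical; the only mapping shown is the PG example from the discussion of the canonical form.

```python
import re

# A data tag in angle brackets followed by its detail field.
TAG_LINE = re.compile(r"^<([^<>]+)>\s*(.*)$")

# Hypothetical mapping from source-specific tags to canonical tags.
CANONICAL_TAGS = {
    "PAGE NO": "PG",      # DOE/RECON
    "PAGINATION": "PG",   # DTIC/DROLS-TR
}


def parse_tagged_line(line):
    """Split '<tag> detail' into (tag, detail); return None if there is no tag."""
    match = TAG_LINE.match(line.strip())
    if match is None:
        return None
    tag, detail = match.groups()
    return tag.strip(), detail.strip()


def to_canonical(line, tags=CANONICAL_TAGS):
    """Like parse_tagged_line, but with the tag replaced by its canonical form."""
    parsed = parse_tagged_line(line)
    if parsed is None:
        return None
    tag, detail = parsed
    return tags.get(tag.upper(), tag), detail


print(parse_tagged_line("<employee.name> John Jones"))
print(to_canonical("<PAGE NO> 17"))
print(to_canonical("<PAGINATION> 30P"))
```

Tags with no canonical equivalent pass through unchanged, so the same functions serve the employee relation and the citation databases alike.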
Appendix A



Hierarchy of Objective-C Classes - @class, @phyla [PPI85][Eag85]

[Figure: class hierarchy tree rooted at Object, with the Primitive phylum.]

Objective-C Base Tree - methods [PPI85][Eag85]

[Figure: method trees for the base classes Object, Sequence, String, and Unknown. Legible method names include capacity, copy, perform:with:with:, printOn:, concat:, and concatSTR:.]

Appendix C

Prototype Source Code 



LEX Specification File 



%{
#include "objc.h"
#include "y.tab.h"
#define MON(x) { yylval.lex = x; return MONTH; }
= (N, Collection, Primitive)
%}
%%
[jJ]an("."|uary)?       MON(1)
[fF]eb("."|ruary)?      MON(2)
[mM]ar("."|ch)?         MON(3)
[aA]pr("."|il)?         MON(4)
[mM]ay                  MON(5)
[jJ]un("."|e)?          MON(6)
[jJ]ul("."|y)?          MON(7)
[aA]ug("."|ust)?        MON(8)
[sS]ep("."|tember)?     MON(9)
[oO]ct("."|ober)?       MON(10)
[nN]ov("."|ember)?      MON(11)
[dD]ec("."|ember)?      MON(12)
[0-9]                   { yylval.lex = yytext[0] - '0'; /* digit value, not character code */
                          return DIGIT; }
" "                     { ; /* delete blanks */ }           /* pattern assumed */
"\n"                    { return EOL; }                     /* pattern assumed */
"\r"                    { return EOL; }                     /* pattern assumed */
.                       { return (yytext[0]); }             /* return single characters */
%%
#include "stdio.h"

/* days in each month; entry 0 is unused so that months index from 1 */
int noleap[] = { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, };
int leap[]   = { 0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, };

date(month, day, year)          /* range-check a date, reporting errors */
{
    int *daysin;

    daysin = isleap(year) ? leap : noleap;
    if (month < 1 || month > 12)
    {   printf("month out of range \n");
        return;
    }
    if (day < 1 || day > daysin[month])
    {   printf("day of the month out of range\n");
        return;
    }
}

isleap(year)                    /* 1 for a leap year, 0 otherwise */
{   if (year % 4)   return (0);
    if (year % 100) return (1);
    if (year % 400) return (0);
    return (1);
}

int yywrap()
{   return (1); }






YACC Specification File 



%{
#include "objc.h"
= (N, Collection, Primitive)    /* phyla */
extern id dateObj;
%}

%union {
    short lex;                  /* lexical code */
    id obj;                     /* object */
}

%start prog
%token<lex> DIGIT MONTH
%token<lex> EOL
%type<obj> DateStmt
%type<lex> day year number      /* assumed; needed for the numeric values below */
%%
prog:   DateStmt EOL            { exit(0); } ;

DateStmt: MONTH day ',' year    /* separator lost in the listing; ',' assumed */
        {
            date($1, $2, $4);
            $$ = [dateObj mo: $1 da: $2 yr: $4];
            [dateObj print];
        }
    | day MONTH year
        {
            date($2, $1, $3);
            $$ = [dateObj mo: $2 da: $1 yr: $3];
            [dateObj print];
        }
    | number '/' number '/' number      /* month/day/year order assumed */
        {
            $5 = $5 + 1900;             /* two-digit year */
            date($1, $3, $5);
            $$ = [dateObj mo: $1 da: $3 yr: $5];
            [dateObj print];
        }
    | number '-' number '-' number      /* separator illegible in the listing; '-' assumed */
        {
            date($3, $1, $5);
            $$ = [dateObj mo: $3 da: $1 yr: $5];
            [dateObj print];
        }
    | number                            /* year only */
        {
            date(-1, -1, $1);
            $$ = [dateObj mo: -1 da: -1 yr: $1];
            [dateObj print];
        }
    | MONTH number                      /* month and year */
        {
            date($1, -1, $2);
            $$ = [dateObj mo: $1 da: -1 yr: $2];
            [dateObj print];
        }
    | MONTH ',' number                  /* separator lost in the listing; ',' assumed */
        {
            date($1, -1, $3);
            $$ = [dateObj mo: $1 da: -1 yr: $3];
            [dateObj print];
        }
    ;

day:    number ;

year:   number ;

number: DIGIT
    | number DIGIT
        { $$ = 10 * $1 + $2; }
    ;
%%
#include "stdio.h"

yyerror(s)      /* called for yacc syntax error */
char *s;
{
    warning(s, (char *) 0);
}

char *progname = "stdin";

warning(s, t1, t2, t3, t4, t5, t6, t7, t8, t9)  /* print warning message */
char *s, *t1, *t2, *t3, *t4, *t5, *t6, *t7, *t8, *t9;
{
    extern int yylineno;

    /* fprintf(stderr, "file %s: ", progname); */
    fprintf(stderr, s, t1, t2, t3, t4, t5, t6, t7, t8, t9);
    fprintf(stderr, " near line %d\n", yylineno);
}
Date Class Source File



#include "objc.h"
#include "y.tab.h"

/* month names; entry 0 is unused so that months index from 1 */
char *MON[] = { " ", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", };

= Date: Object (Collection, Primitive)
{ int mon, dy, year; }

- mo: (int) aMonth da: (int) aDay yr: (int) aYear
{   mon = aMonth;
    dy = aDay;
    year = aYear;
    return self;
}

- print
{
    /* a negative day or month marks a field that was not supplied */
    if (dy > 0 && mon > 0)
        printf("<DATE> %d %s %d\n", dy, MON[mon], year);
    if (dy < 0 && mon < 0)
        printf("<DATE> %d\n", year);
    if (dy < 0 && mon > 0)
        printf("<DATE> %s %d\n", MON[mon], year);
    /* insert code for different type of prints to account for defaults */
}
Main Program Source File



#include "stdio.h"
#include "objc.h"

= (Collection, Primitive)

extern BOOL msgFlag;
extern IOD yyin, yyout, msgIOD;

id dateObj;

main()
{
    extern id Date, Set;

    msgIOD = stdout;
    msgFlag = NO;

    dateObj = [Date new];       /* the single Date object filled in by the parser */

    printf(">");
    yyparse();
    printf("end yyparse\n");
}

@class (Date, Set, Cltn, IdArray, Sequence)
@phyla (N, Collection, Primitive)
Appendix D

Merged File of Heterogeneous Bibliographic Citations from Six
Database Sources



<ACCESSION NO.> 85129022, 8506
<DATABASE SOURCE> BRS/National Library of Medicine Database
<AUTHORS> Ellison-J-M; Wharff-E-A.
^P^4^> Cambridge Hospital, Massachusetts.
<TITLE> More than a gateway: the role of the emergency psychiatry service
   in the community mental health network.
<PUB DESC> Hosp-Community-Psychiatry. 1985 Feb. 36(2). P 180-5.
<LANGUAGE> EN.
<MAJOR CATEGORY> COMMUNITY-MENTAL-HEALTH-CENTERS: og. EMERGENCY-SERVICE-HOSPITAL: og.
   EMERGENCY-SERVICES-PSYCHIATRIC: og. INTERINSTITUTIONAL-RELATIONS. MENTAL-HEALTH-SERVICES:
   og.
<MINOR CATEGORY> ADULT. BOSTON. CASE-REPORT. CATCHMENT-AREA-HEALTH.
   CRISIS-INTERVENTION. FEMALE. HOSPITAL-BED-CAPACITY-300-TO-499. HUMAN.
   MALE. MIDDLE-AGE. ROLE. SOCIAL-WORK. [...] in helping the emergency unit
   [...] closer relationships with community agencies. In a contract with
   the state to perform evaluations of all admissions to the state hospital
   psychiatric unit serving the catchment area, the emergency unit performs
   triage and provides backup for the agencies, coordinates the management
   of multi-agency cases, and holds weekly educational conferences for agency
   staff. Using case examples, the authors illustrate how unit and agency
   staff collaborate to ensure continuity of patient care. Author.
<SB> M
<DATE> 1985.
<ISSN> 0022-1597.
<ZN> Z1.107.567.875.
<IM> 8506.
<ED> 850404.
<NO> MH17582.

<ACCESSION NO.> SI 47675 /
<DATABASE SOURCE> DTIC/drols-tr
<TRANSLATION DATE> Mon Jul 1 13:33:43 PDT 1985 (489098023)
<DOWNLOAD DATE> Mon Jul 1 10:18:29 PDT 1985 (489086309)
<DOWNLOAD FILE NAME> gate
<FIELDS AND GROUPS> 17/2
<ENTRY CLASSIFICATION> UNCLASSIFIED
<CORPORATE AUTHOR> BOLT BERANEK AND NEWMAN INC CAMBRIDGE MA
<TITLE> PLURIBUS SATELLITE IMP DEVELOPMENT MOBILE ACCESS TERMINAL NETWORK.
<TITLE CLASSIFICATION> UNCLASSIFIED
<DESCRIPTIVE NOTE> QUARTERLY TECHNICAL REPT. NO. 33, 1 FEB-30 APR 84.
<DATE> MAY 1984
<PAGINATION> 30P
<REPORT NUMBER> BBN-5774
<CONTRACT NUMBER> MDA903-80-C-0353, N00039-81-C-0408
<REPORT CLASSIFICATION> UNCLASSIFIED
<DESCRIPTORS> *SATELLITE COMMUNICATIONS; *TERMINALS; NETWORKS; SHIPBOARD;
   ACCESS; MOBILE; WOSK
<DESCRIPTOR CLASSIFICATION> UNCLASSIFIED
<IDENTIFIERS> PLURIBUS SATELLITE, PACKET NETWORKS, ARPANET, GATEWAYS
<IDENTIFIER CLASSIFICATION> UNCLASSIFIED
<ABSTRACT> THIS QUARTERLY TECHNICAL REPORT DESCRIBES WORK ON THE DEVELOPMENT
   OF PLURIBUS SATELLITE IMPS; AND ON SHIPBOARD SATELLITE COMMUNICATIONS.
   (AUTHOR)
<ABSTRACT CLASSIFICATION> UNCLASSIFIED
<INITIAL INVENTORY> 12
<LIMITATION CODES> 1
<SOURCE CODE> 060100
<DOCUMENT LOCATION> NTIS
<GEOPOLITICAL CODE> 7.50B
<TYPE CODE> 4
<ACCESSION NO.> Mfei5BS
<DATABASE SOURCE> DIALOG NTIS, FILE 6
<REPORT NO.> <NTIS> DE85000617/XAB
<TITLE> Post-Processing of Bibliographic Citations from DOE/RECON, NASA/RECON,
   and DOD/DROLS. Revision 1.
<AUTHORS> Bollinger, W. A.; Hampel, V. E.; Harrison, I.; Murphy, T. P.
<PUB DESC> Lawrence Livermore National Lab., CA. ; <Code> 068147000; 9513035 ; Department
   [...] DC. ; UCRL-89995-REV.1, CONF-841243-1-REV.1
<DATE> Aug 1984
<PG> 17p
<LANGUAGE> English
<DOCUMENT TYPE> Conference proceeding
<PC> PC A02/MF A01
<JA> CPAi : NSA.neO
<C0_O?.T>iJBS:> United States
[...] online information meeting, London, UK, 4 Dec 1984.
<ABSTRACT> We have developed an interactive, self-guided program for the
   joint post-processing of bibliographic citations from the federal information
   centers of the Department of Energy (DOE), the Department of Defense (DOD),
   and the National Aeronautics and Space Administration (NASA). This program
   is currently installed on the Intelligent Gateway Processor of the Technology
   Information System (TIS/IGP) at the Lawrence Livermore National Laboratory
   and is under evaluation by the TIS user community from remote terminals
   by telephone dial-up, over TYMNET, and the ARPA computer network. Users
   are individually authorized for automated access to specific information
   centers, and use standard commands for the downloading, compilation,
   and online review of citations in a common format. Previously reported
   post-processing capabilities have been further expanded, permitting: (1)
   online citation review, categorization, and addition of new data elements;
   (2) disassembly and re-assembly of citations; (3) statistical analysis
   of data field contents; (4) cross-correlation of data field contents;
   and (5) concordance generation. In addition, the new two-pass interpreter
   for the post-processing program permits the transformation of abbreviated
   data field names into English names preferred by each agency, the statistical
   analysis of the density and completeness of data fields in selected sets
   of bibliographic citations, the elimination of redundant citations (using
   user-specified criteria), and trend analysis. The latter is a powerful
   tool for the exploration of time-dependent characteristics in a particular
   field of research, of an organization, or of an author. Graphical displays
   of publication rates as a function of time and the normalized statistics
   of terms used in the description of the work, can be used to signal new
   directions of ongoing research and the intensity of its support. (ERA
   citation 10:001706)
<DESCRIPTORS> *Information; *Computer Networks; Information Retrieval;
   Specifications
<Indexing Terms> ERDA/990300; NTISDE
<SH> 5B (Behavioral and Social Sciences--Documentation and Information Technology);
   9B (Electronics and Electrical Engineering--Computers); 88B (Library
   and Information Sciences--Information Systems); 62B (Computers, Control,
   and Information Theory--Computer Software)

<ACCESSION NO.> 84C0188555
<DATABASE SOURCE> DOE/recon
<TRANSLATION DATE> Mon Jul 1 13:33:43 PDT 1985 (489098023)
<DOWNLOAD DATE> Mon Jul 1 10:18:29 PDT 1985 (489086309)
<DOWNLOAD FILE NAME> gate
<REPORT NO.PAGE> UCRL--89995-Rev.1 P. 17 ; DE85000617
<TITLE(MONO)> Post-processing of bibliographic citations from DOE/RECON,
   NASA/RECON, and DOD/DROLS. Revision 1
<EDITOR OR COMP> Bollinger, W.A.; Hampel, V.E.; Harrison, I.; Murphy, T.P.
<CORPORATE AUTH> Lawrence Livermore National Lab., CA (USA)
<CORPORATE CODE> 9513035
<TYPE> R
<SEC REPT NO> CONF-841243--1-Rev.1
<PAGE NO> 17
<AVAILABILITY> NTIS, PC A02/MF A01.
<ORDER NUMBER> DE85000617
<CONTRACT NO> Contract W-7405-ENG-48
<CONF TITLE> 8. international online information meeting
<CONF PLACE> London, UK
<CONF DATE> 4 Dec 1984
<DATE> Aug 1984
<CO OF AUTH> US
<CO OF PUBL> US
<ANN J> ERA-10:001706 ; EDB-84:188555
<DISTRIBUTION> MN-32
<DOCUMENT ORIGIN> P
<BIS> TIC
<CATEGORIES> EDB-990300
<PRIMARY CAT> EDB-990300 (GENERAL AND MISCELLANEOUS; INFORMATION HANDLING)
<ABSTRACT> We have developed an interactive, self-guided program for the
   joint post-processing of bibliographic citations from the federal
   information centers of the Department of Energy (DOE), the Department of
   Defense (DOD), and the National Aeronautics and Space Administration
   (NASA). This program is currently installed on the Intelligent Gateway
   Processor of the Technology Information System (TIS/IGP) at the Lawrence
   Livermore National Laboratory and is under evaluation by the TIS user
   community from remote terminals by telephone dial-up, over TYMNET, and the
   ARPA computer network. Users are individually authorized for automated
   access to specific information centers, and use standard commands for the
   downloading, compilation, and online review of citations in a common
   format. Previously reported post-processing capabilities have been
   expanded, permitting: (1) online citation review, categorization, and
   addition of new data elements; (2) disassembly and re-assembly of
   citations; (3) statistical analysis of data field contents; (4)
   cross-correlation of data field contents; and (5) concordance generation.
   In addition, the new two-pass interpreter for the post-processing program
   permits the transformation of abbreviated data field names into English
   names preferred by each agency, the statistical analysis of the density
   and completeness of data fields in selected sets of bibliographic
   citations, the elimination of redundant citations (using user-specified
   criteria), and trend analysis. The latter is a powerful tool for the
   exploration of time-dependent characteristics in a particular field of
   research, of an organization, or of an author. Graphical displays of
   publication rates as a function of time and the normalized statistics of
   terms used in the description of the work, can be used to signal new
   directions of ongoing research and the intensity of its support.
<DESCRIPTORS> *INFORMATION--computer networks; INFORMATION RETRIEVAL;
   SPECIFICATIONS
<ISSUE> 8423
<DOCUMENT NO> 84:188555

<ACCESSION NO.> 84C0173691
<DATABASE SOURCE> DOE/recon
<TRANSLATION DATE> Mon Jul 1 13:33:43 PDT 1985 (489098023)
<DOWNLOAD DATE> Mon Jul 1 10:18:29 PDT 1985 (489086309)
<DOWNLOAD FILE NAME> gate
<REPORT NO.PAGE> UCRL--?1383 P. 16 ; DE85001741
<TITLE(MONO)> Integration of an automated library support system with an
   intelligent gateway
<EDITOR OR COMP> Burton, H.D.
<CORPORATE AUTH> Lawrence Livermore National Lab., CA (USA)
<CORPORATE CODE> 9513035
<TYPE> R
<SEC REPT NO> CONF-84091.'^--1
<PAGE NO> 16
<AVAILABILITY> NTIS, PC A02/MF A01
<ORDER NUMBER> DE85001741
<CONTRACT NO> Contract W-7405-ENG-48
<CONF TITLE> Integrated online library systems conference
<CONF PLACE> Atlanta, GA, USA
<CONF DATE> 13 Sep 1984
<DATE> Aug 1984
<CO OF AUTH> US
<CO OF PUBL> US
<ANN J> EDB-84:173691
<DISTRIBUTION> MN-32
<DOCUMENT ORIGIN> P
<BIS> TIC
<CATEGORIES> EDB-990300
<PRIMARY CAT> EDB-990300 (GENERAL AND MISCELLANEOUS; INFORMATION HANDLING)
<ABSTRACT> A new project of the Technology Information System (TIS) at the